biplotEZ

User-friendly biplots in R


Centre for Multi-Dimensional Data Visualisation (MuViSU)
muvisu@sun.ac.za


StatCon 2023

2023-09-02

Outline

  • Session 1: Introduction to biplots
    • Introduction to biplots
    • PCA biplot
    • Getting started: biplotEZ
  • Session 2: Other types of biplots
    • Canonical Variate Analysis biplots
    • CVA biplot with biplotEZ

Introduction

  • The biplot is a powerful and very useful data visualisation tool. Biplots make information in a table of data become transparent, revealing the main structures in the data in a methodical way, for example patterns of correlations between variables or similarities between the observations.
  • Consider an \(n\) x \(p\) data matrix containing numerical information on \(p\) variables for each of \(n\) samples.
  • The \(n\) samples can be represented in a two-dimensional scatter diagram, where the coordinate axes can represent any two variables.
  • Samples can also be represented in a three-dimensional diagram developed from Cartesian geometry and conceptually in multidimensional forms in which there are many coordinate axes to represent the \(n\) samples.

Scatterplot of 2 variables

Biplots

  • Although two-dimensional scatter diagrams are crucial for presenting data, multidimensional scatter diagrams tend not to be.
  • Statisticians therefore, developed methods for approximating multidimensional scatter diagrams into a biplot.
  • A biplot is a generalisation of a two-dimensional scatter diagram of data that exists in a higher dimensional space, where information on both samples and variables can be displayed graphically.
  • Gabriel (1971) introduced the biplot in which the samples are represented as \(n\) points and the variables are represented as \(p\) vectors relative to the same axes and origin.

Iris data

library(tibble)
tibble(iris)
# # A tibble: 150 × 5
#    Sepal.Length Sepal.Width Petal.Length
#           <dbl>       <dbl>        <dbl>
#  1          5.1         3.5          1.4
#  2          4.9         3            1.4
#  3          4.7         3.2          1.3
#  4          4.6         3.1          1.5
#  5          5           3.6          1.4
#  6          5.4         3.9          1.7
#  7          4.6         3.4          1.4
#  8          5           3.4          1.5
#  9          4.4         2.9          1.4
# 10          4.9         3.1          1.5
# # ℹ 140 more rows
# # ℹ 2 more variables: Petal.Width <dbl>,
# #   Species <fct>
summary(iris)
#   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
#  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
#  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
#  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
#  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
#  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
#        Species  
#  setosa    :50  
#  versicolor:50  
#  virginica :50  
#                 
#                 
# 

Biplot of Iris data

library(biplotEZ)
library(magrittr)
biplot(iris[,1:4],iris[,5]) |> PCA() |> plot()

Biplots

Considerations

  • the types of variables (quantitative, qualitative, ordinal, etc.);
  • the method used for displaying samples;
  • what the biplot will display, either for predicting a sample point or interpolating a new sample point.

For now the focus will be on (1) continuous data, (2) the method used to display samples through approximations of the data, and (3) predicting samples.

Constructing a biplot

Data: \({\bf{X}}\)

#          X1       X2        X3
# 1  5.418840 5.054240  8.711160
# 2  3.129920 1.783160  3.385920
# 3  6.128080 2.173200  8.173560
# 4  6.781120 4.753280  8.731640
# 5  7.346560 5.893200 11.303040
# 6  7.208200 3.744000 10.075760
# 7  7.039440 5.213640  8.608840
# 8  5.465720 4.492640  5.596520
# 9  7.723240 4.708120 11.357480
# 10 7.109560 4.987520  7.732840
# 11 8.135800 4.392000  9.264840
# 12 6.287480 5.908720  7.488240
# 13 4.648880 7.198280  8.573720
# 14 5.798600 6.120080  8.254840
# 15 8.084560 3.234840  8.966560
# 16 6.157773 5.743455  9.899045
# 17 3.556727 2.026318  3.847636
# 18 6.963727 2.469545  9.288136
# 19 7.705818 5.401455  9.922318
# 20 8.348364 6.696818 12.844364
# 21 8.191136 4.254545 11.449727
# 22 7.999364 5.924591  9.782773
# 23 6.211045 5.105273  6.359682
# 24 8.776409 5.350136 12.906227
# 25 8.079045 5.667636  8.787318

Constructing a biplot

Geometrically the rows of \({\bf{X}}\) are given as coordinates of \(n\) samples in the \(p\)-dimensional space \(\mathbb{R}^p\).

The aim is to seek an \(r\)-dimensional plane that contains the points whose coordinates are given by the rows of \({\bf{\hat{X}}}_{[r]}\) which minimises a least squares criterion given by, \[\begin{equation} || {\bf{X}} - {\bf{\hat{X}}}_{[r]}||^2 = tr\{({\bf{X}} - {\bf{\hat{X}}}_{[r]})({\bf{X}} - {\bf{\hat{X}}}_{[r]})'\}. \end{equation}\]

The best approximation that minimises the least squares criterion is the \(r\)-dimensional Eckart-Young approximation given by \({\bf{\hat{X}}}_{[r]} = {\bf{U}} {\bf{D}}_{[r]} {\bf{V}}'\)

Representing samples

A standard result when \(r=2\) from is that the row vectors of \({\bf{\hat{X}}}_{[2]}\) are the orthogonal projections of the corresponding row vectors of \({\bf{X}}\) onto the column space of \({\bf{V}}_2\). The projections are therefore,

\[\begin{equation} {\bf{X}} {\bf{V}}_2. \end{equation}\] These projections are also known as the first two principal components.

Representing variables

The columns of \({\bf{X}}\) are approximated by the first two rows of \({\bf{V}}\), which now represent the axes for each variable.

Calibrated biplot axes

We have constructed a biplot, but the variables represented by the vectors (arrows) have no calibration.

That meaning, there are no markers on the vectors representing the variables analogous to ordinary scatterplots.

To construct a biplot axis with relevant markers for a variable, a \((p-1)\)-dimensional hyperplane \(\mathscr{N}\) perpendicular to the Cartesian axis is required.

First plane

From the data, \(p = 3\) therefore, a two-dimensional hyperplane is constructed perpendicular to \(X_1\) through a specific value of \(X_1\), say \(\mu\).

The intersection of \(\mathscr{L}\) and \(\mathscr{N}\) is an \((r-1)\)-dimensional intersection space, which in this case will be indicated by a line. All the points on this intersection line in \(\mathscr{L}\) will predict the value for \(\mu\) for the \(X_1\)-axis.

Second plane

The plane \(\mathscr{N}\) is shifted orthogonally through another value on \(X_1\) and all the points on the intersection line of \(\mathscr{L}\) and \(\mathscr{N}\) will predict that value that the plane goes through.

Intersection lines

As the plane \(\mathscr{N}\) is shifted along the \(X_1\)-axis, a series of parallel intersection spaces is obtained.

Any line passing through the origin will pass through these intersection spaces and can be used as an axis fitted with markers according to the value associated with the particular intersection space.

To facilitate orthogonal projection onto the axis, similar to an ordinary scatterplot, the line orthogonal to these intersection spaces is chosen.

PCA Biplot

biplot(X) |> PCA() |> plot()

Installing biplotEZ

# Install from GitHub
library(devtools)
install_github("MuViSU/biplotEZ_beta")

library(biplotEZ)

# Getting help                 
help(package="biplotEZ")

# helpfiles for specific functions
?biplotEZ::biplot
?biplotEZ::PCA
# ...

General notes

  • Use the native R pipe: |>
  • Since %>% is not required, you do not have to load:
library(magrittr)
  • To set |> as your default, change the keyboard shortcut Ctrl-Shift-M:

Global options

Global options

Functions in current version

  • Use the following function to obtain a list of available functions in a package: \(~\)
lsf.str("package:biplotEZ")
# alpha.bags : function (bp, alpha = 0.95, which = NULL, col = ez.col, lty = 1, lwd = 1, 
#     max = 2500)  
# axes : function (bp, X.names = colnames(bp$X), which = 1:bp$p, col = grey(0.7), 
#     lwd = 1, lty = 1, label.dir = "Orthog", label.col = col, label.cex = 0.75, 
#     label.dist = 0, ticks = 5, tick.col = col, tick.size = 1, tick.label = TRUE, 
#     tick.label.col = tick.col, tick.label.cex = 0.6, tick.label.side = "left", 
#     tick.label.offset = 0.5, tick.label.pos = 1, predict.col = col, predict.lwd = lwd, 
#     predict.lty = lty, ax.names = X.names, orthogx = 0, orthogy = 0)  
# biplot : function (data, group.aes = NULL, center = TRUE, scaled = FALSE, Title = NULL)  
# concentration.ellipse : function (bp, df = 2, kappa = NULL, which = NULL, alpha = 0.95, col = ez.col, 
#     lty = 1, lwd = 1, alpha.transparency = 0.5)  
# CVA : function (bp, dim.biplot = c(2, 1, 3), e.vects = 1:ncol(bp$X), group.aes = bp$group.aes, 
#     weightedCVA = "weighted", ...)  
# indmat : function (groep.vec)  
# legend.type : function (bp, samples = FALSE, means = FALSE, bags = FALSE, ellipses = FALSE, 
#     new = FALSE, ...)  
# PCA : function (bp, dim.biplot = c(2, 1, 3), e.vects = 1:ncol(bp$X), group.aes = NULL, 
#     correlation.biplot = FALSE, ...)  
# samples : function (bp, col = ez.col, pch = 3, cex = 1, label = FALSE, label.cex = 0.75, 
#     label.side = "bottom", connected = FALSE, alpha = 1)

Flow of functions

  • Start with biplot().
  • Use the preferred biplot: PCA() or CVA(), available in current version.
    • Format samples() and axes().
    • Add alpha.bags() or concentration.ellipse().
    • Add a legend.type().
  • To visualise, end with plot().

Use pipe ( |> ) between each subsequent function. The order of samples(), axes(), alpha.bags(), concentration.ellipse() and legend.type() can be interchanged.

biplot() - arguments

# function (data, group.aes = NULL, center = TRUE, scaled = FALSE, 
#     Title = NULL) 
# NULL
Argument Description
data A dataframe or matrix containing all variables the user wants to analyse.
group.aes Variable from the data to be used as a grouping variable.
center TRUE or FALSE
scaled TRUE or FALSE
Title Title of the biplot to be rendered.

biplot() - print

out <- biplot(data = iris)
out
# Object of class biplot, based on 150 samples and 5 variables.

biplot() - values

out <- biplot(iris[,1:4], group.aes=iris[,5])
str(out)
# List of 14
#  $ X        : num [1:150, 1:4] -0.743 -0.943 -1.143 -1.243 -0.843 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#   ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ Xcat     : NULL
#  $ raw.X    :'data.frame':    150 obs. of  4 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ na.action: NULL
#  $ center   : logi TRUE
#  $ scaled   : logi FALSE
#  $ means    : Named num [1:4] 5.84 3.06 3.76 1.2
#   ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ sd       : num [1:4] 1 1 1 1
#  $ n        : int 150
#  $ p        : int 4
#  $ group.aes: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ g.names  : chr [1:3] "setosa" "versicolor" "virginica"
#  $ g        : int 3
#  $ Title    : NULL
#  - attr(*, "class")= chr "biplot"

PCA() - arguments

# function (bp, dim.biplot = c(2, 1, 3), e.vects = 1:ncol(bp$X), 
#     group.aes = NULL, correlation.biplot = FALSE, ...) 
# NULL
Argument Description
bp Object of class biplot.
dim.biplot Dimension of the biplot. Only values 1, 2 and 3 are accepted, with default 2.
e.vects Which eigenvectors (principal components) to extract, with default 1:dim.biplot.

PCA() - arguments

# function (bp, dim.biplot = c(2, 1, 3), e.vects = 1:ncol(bp$X), 
#     group.aes = NULL, correlation.biplot = FALSE, ...) 
# NULL
Argument Description
group.aes Optional vector of the same length as the number of rows in the data matrix for differentiated aesthetics for samples.
correlation.biplot If FALSE, the distances between sample points are optimally approximated in the biplot. If TRUE, the correlations between variables are optimally approximated by the cosine of the angles between axes. Deafult is FALSE.

PCA() - arguments

out <- biplot(iris[,1:4]) |> PCA()
out
# Object of class biplot, based on 150 samples and 4 variables.

PCA() - values

str(out)
# List of 18
#  $ X          : num [1:150, 1:4] -0.743 -0.943 -1.143 -1.243 -0.843 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#   ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ Xcat       : NULL
#  $ raw.X      :'data.frame':  150 obs. of  4 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ na.action  : NULL
#  $ center     : logi TRUE
#  $ scaled     : logi FALSE
#  $ means      : Named num [1:4] 5.84 3.06 3.76 1.2
#   ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ sd         : num [1:4] 1 1 1 1
#  $ n          : int 150
#  $ p          : int 4
#  $ group.aes  : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
#  $ g.names    : chr "1"
#  $ g          : int 1
#  $ Title      : NULL
#  $ Z          : num [1:150, 1:2] -2.68 -2.71 -2.89 -2.75 -2.73 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#  $ Vr         : num [1:4, 1:2] 0.3614 -0.0845 0.8567 0.3583 -0.6566 ...
#  $ Xhat       : num [1:150, 1:4] 5.08 4.75 4.7 4.64 5.07 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#   ..- attr(*, "scaled:center")= Named num [1:4] -5.84 -3.06 -3.76 -1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ ax.one.unit: num [1:4, 1:2] 0.643 -0.156 1.121 2.672 -1.169 ...
#  - attr(*, "class")= chr [1:2] "biplot" "PCA"

PCA() with plot() - simple

biplot(iris[,1:4]) |> PCA() |> plot()

PCA() with plot() - groups

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> plot()

samples() - arguments

# function (bp, col = ez.col, pch = 3, cex = 1, label = FALSE, 
#     label.cex = 0.75, label.side = "bottom", connected = FALSE, 
#     alpha = 1) 
# NULL
Argument Description
bp Object of class biplot.
col Sample colour.
pch Sample plotting character.
cex Sample character expansion.
label Logical argument, whether samples should be labelled or not, with default FALSE.

samples() - arguments

# function (bp, col = ez.col, pch = 3, cex = 1, label = FALSE, 
#     label.cex = 0.75, label.side = "bottom", connected = FALSE, 
#     alpha = 1) 
# NULL
Argument Description
label.cex Label text expansion.
label.side Side of the plotting character where label appears, with default “bottom”.
connected Logical argument, whether samples are connected in order of rows of data matrix, with default “FALSE”.
alpha Opacity, with default 1.

samples() - values

out <- biplot(iris[,1:4]) |> PCA() |> samples()
str(out)
# List of 19
#  $ X          : num [1:150, 1:4] -0.743 -0.943 -1.143 -1.243 -0.843 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#   ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ Xcat       : NULL
#  $ raw.X      :'data.frame':  150 obs. of  4 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ na.action  : NULL
#  $ center     : logi TRUE
#  $ scaled     : logi FALSE
#  $ means      : Named num [1:4] 5.84 3.06 3.76 1.2
#   ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ sd         : num [1:4] 1 1 1 1
#  $ n          : int 150
#  $ p          : int 4
#  $ group.aes  : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
#  $ g.names    : chr "1"
#  $ g          : int 1
#  $ Title      : NULL
#  $ Z          : num [1:150, 1:2] -2.68 -2.71 -2.89 -2.75 -2.73 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#  $ Vr         : num [1:4, 1:2] 0.3614 -0.0845 0.8567 0.3583 -0.6566 ...
#  $ Xhat       : num [1:150, 1:4] 5.08 4.75 4.7 4.64 5.07 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#   ..- attr(*, "scaled:center")= Named num [1:4] -5.84 -3.06 -3.76 -1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ ax.one.unit: num [1:4, 1:2] 0.643 -0.156 1.121 2.672 -1.169 ...
#  $ samples    :List of 9
#   ..$ col       : chr "blue"
#   ..$ pch       : num 3
#   ..$ cex       : num 1
#   ..$ label     : logi FALSE
#   ..$ label.cex : num 0.75
#   ..$ label.side: chr "bottom"
#   ..$ connected : logi FALSE
#   ..$ alpha     : num 1
#   ..$ g         : int 1
#  - attr(*, "class")= chr [1:2] "biplot" "PCA"

PCA(), samples() with plot()

biplot(iris[,1:4]) |> PCA() |> samples(col="purple",pch=15) |> plot()

PCA(), samples() with plot()

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> samples(label = TRUE, pch = "", 
col=c("purple","magenta","blue")) |> plot()

axes() - arguments

# function (bp, X.names = colnames(bp$X), which = 1:bp$p, col = grey(0.7), 
#     lwd = 1, lty = 1, label.dir = "Orthog", label.col = col, 
#     label.cex = 0.75, label.dist = 0, ticks = 5, tick.col = col, 
#     tick.size = 1, tick.label = TRUE, tick.label.col = tick.col, 
#     tick.label.cex = 0.6, tick.label.side = "left", tick.label.offset = 0.5, 
#     tick.label.pos = 1, predict.col = col, predict.lwd = lwd, 
#     predict.lty = lty, ax.names = X.names, orthogx = 0, orthogy = 0) 
# NULL
Argument Description
bp Object of class biplot.
X.names Data column names for which axes should be constructed.
which Column numbers for which axes should be constructed.
col Axis colours.
lwd Axis line widths.

axes() - arguments

# function (bp, X.names = colnames(bp$X), which = 1:bp$p, col = grey(0.7), 
#     lwd = 1, lty = 1, label.dir = "Orthog", label.col = col, 
#     label.cex = 0.75, label.dist = 0, ticks = 5, tick.col = col, 
#     tick.size = 1, tick.label = TRUE, tick.label.col = tick.col, 
#     tick.label.cex = 0.6, tick.label.side = "left", tick.label.offset = 0.5, 
#     tick.label.pos = 1, predict.col = col, predict.lwd = lwd, 
#     predict.lty = lty, ax.names = X.names, orthogx = 0, orthogy = 0) 
# NULL
Argument Description
lty Axis line types.
label.dir Direction of the axis label. “Orthog”, “Hor” or “Paral”, with default “Orthog”.
label.col Colour of the axis label.
label.cex Character expansion of the axis label.
label.dist Distance of the axis label from the plot.

axes() - arguments

# function (bp, X.names = colnames(bp$X), which = 1:bp$p, col = grey(0.7), 
#     lwd = 1, lty = 1, label.dir = "Orthog", label.col = col, 
#     label.cex = 0.75, label.dist = 0, ticks = 5, tick.col = col, 
#     tick.size = 1, tick.label = TRUE, tick.label.col = tick.col, 
#     tick.label.cex = 0.6, tick.label.side = "left", tick.label.offset = 0.5, 
#     tick.label.pos = 1, predict.col = col, predict.lwd = lwd, 
#     predict.lty = lty, ax.names = X.names, orthogx = 0, orthogy = 0) 
# NULL
Argument Description
ticks Indicator of the number of tick marks on each axis, with default 5.
tick.col Colour of tick marks.
tick.size Size of tick marks.
tick.label Logical whether tick marks should be labelled, with default TRUE.
tick.label.col Colour of tick mark labels.

axes() - arguments

# function (bp, X.names = colnames(bp$X), which = 1:bp$p, col = grey(0.7), 
#     lwd = 1, lty = 1, label.dir = "Orthog", label.col = col, 
#     label.cex = 0.75, label.dist = 0, ticks = 5, tick.col = col, 
#     tick.size = 1, tick.label = TRUE, tick.label.col = tick.col, 
#     tick.label.cex = 0.6, tick.label.side = "left", tick.label.offset = 0.5, 
#     tick.label.pos = 1, predict.col = col, predict.lwd = lwd, 
#     predict.lty = lty, ax.names = X.names, orthogx = 0, orthogy = 0) 
# NULL
Argument Description
tick.label.cex Character expansion of of tick mark labels.
tick.label.side On which side of tick marks label should appear, with default “left”.
tick.label.offset Offset for position of tick mark labels.
tick.label.pos pos argument for text function for tick mark labels, with default 1.
predict.col Colour of lines showing orthogonal projection onto axes.

axes() - arguments

# function (bp, X.names = colnames(bp$X), which = 1:bp$p, col = grey(0.7), 
#     lwd = 1, lty = 1, label.dir = "Orthog", label.col = col, 
#     label.cex = 0.75, label.dist = 0, ticks = 5, tick.col = col, 
#     tick.size = 1, tick.label = TRUE, tick.label.col = tick.col, 
#     tick.label.cex = 0.6, tick.label.side = "left", tick.label.offset = 0.5, 
#     tick.label.pos = 1, predict.col = col, predict.lwd = lwd, 
#     predict.lty = lty, ax.names = X.names, orthogx = 0, orthogy = 0) 
# NULL
Argument Description
predict.lwd Line width of lines showing orthogonal projection onto axes.
predict.lty Line type of lines showing ortghonoal projection onto axes.
ax.names User specified alternative names for variables.
orthogx Horizontal translation.
orthogy Vertical translation.

axes() - values

out <- biplot(iris[,1:4]) |> PCA() |> axes()
str(out)
# List of 19
#  $ X          : num [1:150, 1:4] -0.743 -0.943 -1.143 -1.243 -0.843 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#   ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ Xcat       : NULL
#  $ raw.X      :'data.frame':  150 obs. of  4 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ na.action  : NULL
#  $ center     : logi TRUE
#  $ scaled     : logi FALSE
#  $ means      : Named num [1:4] 5.84 3.06 3.76 1.2
#   ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ sd         : num [1:4] 1 1 1 1
#  $ n          : int 150
#  $ p          : int 4
#  $ group.aes  : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
#  $ g.names    : chr "1"
#  $ g          : int 1
#  $ Title      : NULL
#  $ Z          : num [1:150, 1:2] -2.68 -2.71 -2.89 -2.75 -2.73 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#  $ Vr         : num [1:4, 1:2] 0.3614 -0.0845 0.8567 0.3583 -0.6566 ...
#  $ Xhat       : num [1:150, 1:4] 5.08 4.75 4.7 4.64 5.07 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#   ..- attr(*, "scaled:center")= Named num [1:4] -5.84 -3.06 -3.76 -1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ ax.one.unit: num [1:4, 1:2] 0.643 -0.156 1.121 2.672 -1.169 ...
#  $ axes       :List of 23
#   ..$ which            : int [1:4] 1 2 3 4
#   ..$ col              : chr [1:4] "#B3B3B3" "#B3B3B3" "#B3B3B3" "#B3B3B3"
#   ..$ lwd              : num [1:4] 1 1 1 1
#   ..$ lty              : num [1:4] 1 1 1 1
#   ..$ label.dir        : chr "Orthog"
#   ..$ label.col        : chr [1:4] "#B3B3B3" "#B3B3B3" "#B3B3B3" "#B3B3B3"
#   ..$ label.cex        : num [1:4] 0.75 0.75 0.75 0.75
#   ..$ label.dist       : num [1:4] 0 0 0 0
#   ..$ ticks            : num [1:4] 5 5 5 5
#   ..$ tick.col         : chr [1:4] "#B3B3B3" "#B3B3B3" "#B3B3B3" "#B3B3B3"
#   ..$ tick.size        : num [1:4] 1 1 1 1
#   ..$ tick.label       : logi [1:4] TRUE TRUE TRUE TRUE
#   ..$ tick.label.col   : chr [1:4] "#B3B3B3" "#B3B3B3" "#B3B3B3" "#B3B3B3"
#   ..$ tick.label.cex   : num [1:4] 0.6 0.6 0.6 0.6
#   ..$ tick.label.side  : chr [1:4] "left" "left" "left" "left"
#   ..$ tick.label.offset: num [1:4] 0.5 0.5 0.5 0.5
#   ..$ tick.label.pos   : num [1:4] 1 1 1 1
#   ..$ predict.col      : chr [1:4] "#B3B3B3" "#B3B3B3" "#B3B3B3" "#B3B3B3"
#   ..$ predict.lty      : num [1:4] 1 1 1 1
#   ..$ predict.lwd      : num [1:4] 1 1 1 1
#   ..$ names            : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#   ..$ orthogx          : num [1:4] 0 0 0 0
#   ..$ orthogy          : num [1:4] 0 0 0 0
#  - attr(*, "class")= chr [1:2] "biplot" "PCA"

PCA(), axes() with plot()

biplot(iris[,1:4]) |> PCA() |> axes(col="purple") |> plot()

PCA(), axes() with plot()

biplot(iris[,1:4]) |> PCA() |> axes(which = c(1,2,3)) |> plot()

PCA(), axes() with plot()

biplot(iris[,1:4]) |> PCA() |> axes(which = c(1,2,4)) |> plot()

PCA(), axes() with plot()

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> axes(col="black", tick.col = 
"black", tick.size = 1.2, lwd = 1.2, label.cex = 1, tick.label.cex = 0.7) |> plot()

alpha.bags() - arguments

# function (bp, alpha = 0.95, which = NULL, col = ez.col, lty = 1, 
#     lwd = 1, max = 2500) 
# NULL
Argument Description
bp Object of class biplot.
alpha Value between 0 and 1 to determine coverage of bag, with default 0.95.
which Which groups or classes to be fitted with alpha bags.
col Vector of colours for the alpha bags.

alpha.bags() - arguments

# function (bp, alpha = 0.95, which = NULL, col = ez.col, lty = 1, 
#     lwd = 1, max = 2500) 
# NULL
Argument Description
lty Vector of line types for the alpha bags.
lwd Vector of line widths for the alpha bags.
max Maximum number of samples to include in alpha bag calculation, with default 2500.

alpha.bags() - values

out <- biplot (iris[,1:4],group.aes=iris[,5]) |> PCA() |> alpha.bags(alpha=0.95)
# Computing 0.95 -bag for setosa 
# Computing 0.95 -bag for versicolor 
# Computing 0.95 -bag for virginica
str(out)
# List of 20
#  $ X            : num [1:150, 1:4] -0.743 -0.943 -1.143 -1.243 -0.843 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#   ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ Xcat         : NULL
#  $ raw.X        :'data.frame':    150 obs. of  4 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ na.action    : NULL
#  $ center       : logi TRUE
#  $ scaled       : logi FALSE
#  $ means        : Named num [1:4] 5.84 3.06 3.76 1.2
#   ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ sd           : num [1:4] 1 1 1 1
#  $ n            : int 150
#  $ p            : int 4
#  $ group.aes    : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ g.names      : chr [1:3] "setosa" "versicolor" "virginica"
#  $ g            : int 3
#  $ Title        : NULL
#  $ Z            : num [1:150, 1:2] -2.68 -2.71 -2.89 -2.75 -2.73 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#  $ Vr           : num [1:4, 1:2] 0.3614 -0.0845 0.8567 0.3583 -0.6566 ...
#  $ Xhat         : num [1:150, 1:4] 5.08 4.75 4.7 4.64 5.07 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#   ..- attr(*, "scaled:center")= Named num [1:4] -5.84 -3.06 -3.76 -1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ ax.one.unit  : num [1:4, 1:2] 0.643 -0.156 1.121 2.672 -1.169 ...
#  $ alpha.bags   :List of 3
#   ..$ setosa-0.95    : num [1:27, 1:2] -2.24 -2.22 -2.22 -2.21 -2.22 ...
#   .. ..- attr(*, "dimnames")=List of 2
#   .. .. ..$ : chr [1:27] "sr" "sr" "sr" "sr" ...
#   .. .. ..$ : NULL
#   ..$ versicolor-0.95: num [1:31, 1:2] 1.43 1.45 1.46 1.52 1.52 ...
#   .. ..- attr(*, "dimnames")=List of 2
#   .. .. ..$ : chr [1:31] "" "" "" "" ...
#   .. .. ..$ : NULL
#   ..$ virginica-0.95 : num [1:19, 1:2] 3.48 3.48 3.62 3.54 3.44 ...
#   .. ..- attr(*, "dimnames")=List of 2
#   .. .. ..$ : chr [1:19] "" "" "" "" ...
#   .. .. ..$ : NULL
#  $ alpha.bag.aes:List of 3
#   ..$ col: chr [1:3] "blue" "green" "gold"
#   ..$ lty: num [1:3] 1 1 1
#   ..$ lwd: num [1:3] 1 1 1
#  - attr(*, "class")= chr [1:2] "biplot" "PCA"

PCA(),alpha.bags() with plot()

biplot (iris[,1:4],group.aes=iris[,5]) |> PCA() |> alpha.bags(alpha=0.95) |> plot()
# Computing 0.95 -bag for setosa 
# Computing 0.95 -bag for versicolor 
# Computing 0.95 -bag for virginica

concentration.ellipse() - arguments

# function (bp, df = 2, kappa = NULL, which = NULL, alpha = 0.95, 
#     col = ez.col, lty = 1, lwd = 1, alpha.transparency = 0.5) 
# NULL
Argument Description
bp Object of class biplot.
df Degrees of freedom, with default 2.
kappa Value to construct kappa-ellipse.
which The selection of the group for ellipse construction.
alpha Size of alpha-bag, with default 0.95.

concentration.ellipse() - arguments

# function (bp, df = 2, kappa = NULL, which = NULL, alpha = 0.95, 
#     col = ez.col, lty = 1, lwd = 1, alpha.transparency = 0.5) 
# NULL
Argument Description
col Colour of ellipse.
lty Line type of ellipse.
lwd Line width of ellipse.
alpha.transparency Level of opacity.

concentration.ellipse() - values

out <- biplot (iris[,1:4]) |> PCA(group.aes=iris[,5]) |> concentration.ellipse(kappa=2) 
# Computing 2 -ellipse for setosa 
# Computing 2 -ellipse for versicolor 
# Computing 2 -ellipse for virginica
str(out)
# List of 20
#  $ X               : num [1:150, 1:4] -0.743 -0.943 -1.143 -1.243 -0.843 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#   ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ Xcat            : NULL
#  $ raw.X           :'data.frame': 150 obs. of  4 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ na.action       : NULL
#  $ center          : logi TRUE
#  $ scaled          : logi FALSE
#  $ means           : Named num [1:4] 5.84 3.06 3.76 1.2
#   ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ sd              : num [1:4] 1 1 1 1
#  $ n               : int 150
#  $ p               : int 4
#  $ group.aes       : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ g.names         : chr [1:3] "setosa" "versicolor" "virginica"
#  $ g               : int 3
#  $ Title           : NULL
#  $ Z               : num [1:150, 1:2] -2.68 -2.71 -2.89 -2.75 -2.73 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#  $ Vr              : num [1:4, 1:2] 0.3614 -0.0845 0.8567 0.3583 -0.6566 ...
#  $ Xhat            : num [1:150, 1:4] 5.08 4.75 4.7 4.64 5.07 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#   .. ..$ : NULL
#   ..- attr(*, "scaled:center")= Named num [1:4] -5.84 -3.06 -3.76 -1.2
#   .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#  $ ax.one.unit     : num [1:4, 1:2] 0.643 -0.156 1.121 2.672 -1.169 ...
#  $ conc.ellipses   :List of 3
#   ..$ setosa-2    : num [1:6284, 1:2] -2.92 -2.92 -2.92 -2.92 -2.92 ...
#   ..$ versicolor-2: num [1:6284, 1:2] -0.631 -0.631 -0.631 -0.631 -0.63 ...
#   ..$ virginica-2 : num [1:6284, 1:2] 0.722 0.722 0.722 0.722 0.723 ...
#  $ conc.ellipse.aes:List of 4
#   ..$ col  : chr [1:3] "blue" "green" "gold"
#   ..$ lty  : num [1:3] 1 1 1
#   ..$ lwd  : num [1:3] 1 1 1
#   ..$ alpha: num [1:3] 0.5 0.5 0.5
#  - attr(*, "class")= chr [1:2] "biplot" "PCA"

PCA(),concentration.ellipse() with plot()

biplot (iris[,1:4]) |> PCA(group.aes=iris[,5]) |> concentration.ellipse(kappa=2) |> plot()
# Computing 2 -ellipse for setosa 
# Computing 2 -ellipse for versicolor 
# Computing 2 -ellipse for virginica

legend.type() - arguments

# function (bp, samples = FALSE, means = FALSE, bags = FALSE, ellipses = FALSE, 
#     new = FALSE, ...) 
# NULL
Argument Description
bp Object of class biplot.
samples Logical argument, whether legend should be printed for samples, with default “FALSE”.
means Logical argument, whether legend should be printed for means, with default “FALSE”.
bags Logical argument, whether legend should be printed for bags, with default “FALSE”.
new Logical argument, whether legend should appear on a new page, with default “FALSE”.

legend.type()

biplot (iris[,1:4], Title="Test biplot") |> PCA(group.aes = iris[,5]) |>
    legend.type(samples=TRUE) |> plot()

Canonical Variate Analysis & biplots

  • Discriminant Analysis is a predictive method that focuses on the relationship between a data matrix of predictor variables and a response variable.
  • Its primary goal is to optimally separate different groups of objects to provide a classification rule for assigning entities of unknown origin to one of a known set of groups.
  • In Linear Discriminant Analysis (LDA), researchers attempt to find linear combinations of the predictor variables that maximise the ratio of between-class variance to within-class variance.
  • CVA is a direct application of LDA, making a CVA biplot a graphical representation of LDA.
  • The goal is to discover a mapping that can transform each column of matrix \(\pmb{X}\) into a column within a lower-dimensional space, while exploring the cluster structure of the original data.

Constructing the CVA biplot

  • The goal of CVA is to find a set of \(p\) linear combinations of \(\pmb{X}\), \({\bf{m}}_1, {\bf{m}}_2, \ldots, {\bf{m}}_p\) that maximise the between-group variation and minimise the within-group variation.
  • These linear combinations are known as canonical variates.
  • The within-group variation can be measured by the sum of squared deviations from their respective means, \({\bf{W}} = {\bf{X}}'{\bf{X}} - \bar{\bf{X}}'{\bf{N}}\bar{\bf{X}}\).
  • The between-group variation can be measured by the sum of squared deviations between the group means, \({\bf{B}} = \bar{\bf{X}}' {\bf{N}} \bar{\bf{X}}\).
  • where \({\bf{N}} = diag(n_1,n_2, \ldots, n_K)\)
  • The linear combination which optimally seperates the observations from the different groups with respect to the \(n\) samples is defined by the coefficient vector \(\pmb{m}\) which maximises the variance ratio: \(\lambda = \frac{{\bf{m}}' {\bf{B}}{\bf{m}}}{{\bf{m}}' {\bf{W}} {\bf{m}}}\).

Representing samples

\[{\bf{X}}{\bf{M}}\]

where \({\bf{M}} = {\bf{L V}}\)

\({\bf{L}} = {\bf{W}}^{-\frac{1}{2}}\)

\({\bf{V}}\) are the eigenvectors of \({\bf{W}}^{-\frac{1}{2}} {\bf{B}} {\bf{W}}^{-\frac{1}{2}}\)

Representing variables

Calibrated biplot axes

CVA() - arguments

# function (bp, dim.biplot = c(2, 1, 3), e.vects = 1:ncol(bp$X), 
#     group.aes = bp$group.aes, weightedCVA = "weighted", ...) 
# NULL
Argument Description
bp Object of class biplot obtained from preceding function biplot().
dim.biplot Dimension of the biplot. Only values 1, 2 and 3 are accepted, with default 2.
e.vects Which eigenvectors (principal components) to extract, with default 1:dim.biplot.

CVA() - arguments

# function (bp, dim.biplot = c(2, 1, 3), e.vects = 1:ncol(bp$X), 
#     group.aes = bp$group.aes, weightedCVA = "weighted", ...) 
# NULL
Argument Description
group.aes Vector of the same length as the number of rows in the data matrix for differentiated aesthetics for samples.
weightedCVA The default is “weighted”, specifying a weighted CVA to be performed. Other possible values are “unweightedI” and “unweightedCent”.

CVA() with plot() - groups

biplot(iris[,1:4], group.aes = iris[,5]) |> CVA() |> plot()